Abstract

Node failures are inevitable in distributed storage systems (DSS). To enable efficient repair when faced with such 
failures, two main techniques are known: Regenerating codes, i.e., codes that minimize the total repair bandwidth; 
and local codes, which minimize the number of nodes participating in the repair process. This paper focuses on 
regenerating codes with locality, using pre-coding based on Gabidulin codes, and presents constructions that utilize 
minimum bandwidth regenerating (MBR) local codes. The constructions achieve maximum resilience (i.e., optimal 
minimum distance) and have maximum capacity (i.e., maximum rate). Finally, the same pre-coding mechanism 
can be combined with a subclass of fractional-repetition codes to enable maximum resilience and repair-by-transfer 
simultaneously. 

I. Background 

A. Vector Codes 

An $[n, K, d_{\min}, \alpha]$ vector code over a field $\mathbb{F}_q$ is a code $\mathcal{C}$ of block length $n$, having a symbol alphabet $\mathbb{F}_q^{\alpha}$ for
some $\alpha \geq 1$, satisfying the additional property that given $c, c' \in \mathcal{C}$ and $a, b \in \mathbb{F}_q$, $ac + bc'$ also belongs to $\mathcal{C}$. As a
vector space over $\mathbb{F}_q$, $\mathcal{C}$ has dimension $K$, termed the scalar dimension (equivalently, the file size) of the code, and
as a code over the alphabet $\mathbb{F}_q^{\alpha}$, the code has minimum distance $d_{\min}$.

Associated with the vector code $\mathcal{C}$ is an $\mathbb{F}_q$-linear scalar code $\mathcal{C}^{(s)}$ of length $N = n\alpha$, where $\mathcal{C}^{(s)}$ is obtained
by expanding each vector symbol within a codeword into $\alpha$ scalar symbols (in some prescribed order). Given a
generator matrix $G$ for the scalar code $\mathcal{C}^{(s)}$, the first code symbol in the vector code is naturally associated with
the first $\alpha$ columns of $G$, and so on. We will refer to the collection of $\alpha$ columns of $G$ associated with the $i$-th code symbol
$c_i$ as the $i$-th thick column and, to avoid confusion, to the columns of $G$ themselves as thin columns.

B. Locality in Vector Codes 

Let $\mathcal{C}$ be an $[n, K, d_{\min}, \alpha]$ vector code over a field $\mathbb{F}_q$, possessing a $(K \times n\alpha)$ generator matrix $G$. The $i$-th code
symbol, $c_i$, is said to have $(r, \delta)$ locality, $\delta \geq 2$, if there exists a punctured code $\mathcal{C}_i := \mathcal{C}|_{S_i}$ of $\mathcal{C}$ (called a local
code) with support $S_i \subseteq \{1, 2, \ldots, n\}$ such that

• $i \in S_i$,

• $|S_i| \leq n_L := r + \delta - 1$, and

• $d_{\min}(\mathcal{C}|_{S_i}) \geq \delta$.

The code $\mathcal{C}$ is said to have $(r, \delta)$ information locality if there exist $l$ code symbols with $(r, \delta)$ locality and
respective support sets $\{S_i\}_{i=1}^{l}$ satisfying

• $\mathrm{Rank}(G|_{\cup_{i=1}^{l} S_i}) = K$.

The code $\mathcal{C}$ is said to have $(r, \delta)$ all-symbol locality if all code symbols have $(r, \delta)$ locality. A code with $(r, \delta)$
information (respectively, all-symbol) locality is said to have full $(r, \delta)$ information (respectively, all-symbol) locality
if all local codes have parameters given by $|S_i| = r + \delta - 1$ and $d_{\min}(\mathcal{C}_i) = \delta$, for $i = 1, \ldots, l$.

The concept of locality for scalar codes, with $\delta = 2$, was introduced in [1] and extended in [2] and [3] to scalar
codes with arbitrary $\delta$, and vector codes with $\delta = 2$, respectively. This was further extended to vector codes with
arbitrary $\delta$ in [4] and [5], where, in addition to constructions of vector codes with locality, the authors derive minimum
distance upper bounds and also consider settings in which the local codes have regeneration properties.

Consider now a vector code $\mathcal{C}$ with full $(r, \delta)$ locality whose associated local codes $\mathcal{C}_i$ have parameters $[n_L, K_L, \delta]$.
In this paper, we are interested in local codes that have the uniform rank accumulation property, in particular, local
MBR codes and local fractional-repetition codes.

Definition 1 (Uniform rank accumulation (URA) codes). Let $G$ be a generator matrix for a code $\mathcal{C}$, and let $S_i$ be an
arbitrary subset of $i$ thick columns of $G$, for some $i = 1, \ldots, n$. Then $\mathcal{C}$ is a URA code if the restriction $G|_{S_i}$ of
$G$ to $S_i$ has rank $\rho_i$ that is independent of the specific subset $S_i$ of $i$ indices chosen and is given by $\rho_i = \sum_{j=1}^{i} a_j$
for some set of non-negative integers $\{a_j\}$.

We will refer to the sequence $\{a_i, 1 \leq i \leq n\}$ as the rank accumulation profile of the code $\mathcal{C}$.

We now present the minimum distance upper bound given in [4] for the case when local codes are URA codes.

Consider the finite length vector $(a_1, a_2, \ldots, a_{n_L})$, and its extension to a periodic semi-infinite sequence $\{a_i\}_{i=1}^{\infty}$
of period $n_L$, obtained by defining $a_{i + j n_L} = a_i$, $1 \leq i \leq n_L$, $j \geq 1$. Let $P(\cdot)$ denote the sequence of partial sums,

$$ P(s) = \sum_{i=1}^{s} a_i, \quad s \geq 1. \tag{1} $$

Then, given integers $u_1 \geq 0$, $1 \leq u_0 \leq n_L$, $P(u_1 n_L + u_0) = u_1 K_L + P(u_0)$. Next, let us define the function $P^{(\mathrm{inv})}$
by setting $P^{(\mathrm{inv})}(v)$, for $v \geq 1$, to be the smallest integer $s$ such that $P(s) \geq v$. It can be verified that for
$v_1 \geq 0$, $1 \leq v_0 \leq K_L$,

$$ P^{(\mathrm{inv})}(v_1 K_L + v_0) = v_1 n_L + P^{(\mathrm{inv})}(v_0), $$

where $P^{(\mathrm{inv})}(v_0) \leq r$ as $1 \leq v_0 \leq K_L$.
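The two identities above can be checked numerically. The following is a minimal sketch, assuming a hypothetical local profile $(3, 2, 1, 0)$ with $n_L = 4$, $r = 3$ and $K_L = 6$; the function names `partial_sum` and `partial_sum_inverse` are illustrative and do not come from the paper.

```python
def partial_sum(profile, s):
    """P(s): sum of the first s terms of the periodic extension of profile."""
    periods, rest = divmod(s, len(profile))
    return periods * sum(profile) + sum(profile[:rest])


def partial_sum_inverse(profile, v):
    """P^(inv)(v): smallest s >= 1 with P(s) >= v, for v >= 1."""
    s = 1
    while partial_sum(profile, s) < v:
        s += 1
    return s


# Hypothetical local profile (an assumption): n_L = 4, r = 3, K_L = 6.
profile = [3, 2, 1, 0]
n_L, K_L, r = len(profile), sum(profile), 3

# P(u1 * n_L + u0) = u1 * K_L + P(u0) for 1 <= u0 <= n_L.
for u1 in range(3):
    for u0 in range(1, n_L + 1):
        assert partial_sum(profile, u1 * n_L + u0) == u1 * K_L + partial_sum(profile, u0)

# P^(inv)(v1 * K_L + v0) = v1 * n_L + P^(inv)(v0), with P^(inv)(v0) <= r.
for v1 in range(3):
    for v0 in range(1, K_L + 1):
        inner = partial_sum_inverse(profile, v0)
        assert inner <= r
        assert partial_sum_inverse(profile, v1 * K_L + v0) == v1 * n_L + inner
```

The brute-force search in `partial_sum_inverse` is chosen for transparency rather than speed; it terminates whenever the profile has a positive sum.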

The minimum distance of a code $\mathcal{C}$ whose local codes $\mathcal{C}_i$ are URA codes can be bounded as follows.

Theorem I.1 (Theorem 5.1 of [4]). The minimum distance of $\mathcal{C}$ is upper bounded by

$$ d_{\min} \leq n - P^{(\mathrm{inv})}(K) + 1. \tag{2} $$



The codes achieving the bound in (2) are referred to as codes having optimal locality. For such locality-optimal
codes, one can then analyze whether the code allows for efficient data storage in DSS. Towards this end, file size
bounds for codes with locality are given in [5] using min-cut techniques similar to those of lU. As noted in lH,
when URA codes are used as local codes, the file size bound for $d_{\min}$-optimal codes can be represented in the
form

$$ K \leq P(n - d_{\min} + 1) = \left( \left\lceil \frac{n - d_{\min} + 1}{n_L} \right\rceil - 1 \right) K_L + P(l_0), \tag{3} $$

where $l_0 \in \{1, \ldots, n_L\}$ is such that

$$ n - d_{\min} + 1 = \left( \left\lceil \frac{n - d_{\min} + 1}{n_L} \right\rceil - 1 \right) n_L + l_0. $$

We note that $P(l_0) = P(r)$, for $r \leq l_0$.
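As a sanity check on the decomposition in (3), the sketch below evaluates both sides for a hypothetical URA local profile $(3, 2, 1, 0)$ and block length $n = 12$. The profile, the block length and the function names are assumptions made for illustration only.

```python
def partial_sum(profile, s):
    """P(s) for the periodic extension of the local profile."""
    periods, rest = divmod(s, len(profile))
    return periods * sum(profile) + sum(profile[:rest])


def file_size_bound(profile, n, d_min):
    """Right-hand side of (3): (ceil(m / n_L) - 1) * K_L + P(l0), m = n - d_min + 1."""
    n_l, k_l = len(profile), sum(profile)
    m = n - d_min + 1
    full_blocks = -(-m // n_l) - 1      # integer ceiling of m / n_L, minus one
    l0 = m - full_blocks * n_l          # lies in {1, ..., n_L}
    assert 1 <= l0 <= n_l
    return full_blocks * k_l + sum(profile[:l0])


profile = [3, 2, 1, 0]   # hypothetical: n_L = 4, r = 3, K_L = 6
n = 12

# The split form agrees with P(n - d_min + 1) for every admissible d_min.
for d_min in range(1, n + 1):
    assert file_size_bound(profile, n, d_min) == partial_sum(profile, n - d_min + 1)
```

For instance, with $d_{\min} = 4$ the argument is $n - d_{\min} + 1 = 9 = 2 \cdot 4 + 1$, so the bound evaluates to $2 K_L + P(1) = 15$.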



C. MBR Codes 

An $((n, k, d), (\alpha, \beta), K)$ minimum-bandwidth regenerating (MBR) code is an $[n, K, d_{\min} = n - k + 1, \alpha]$ vector
code satisfying additional constraints described below. The code is intended to be used in a distributed storage
network in which each code symbol is stored within a distinct node. The code is structured in such a way that
the entire file can be recovered by processing the contents of any $k$, $1 \leq k \leq n$, nodes. Further, in case of a
single node failure, the replacement node can reconstruct the data stored in the failed node by connecting to any
$d$, $k \leq d \leq n - 1$, nodes and downloading $\beta = \alpha / d$ symbols from each node. The scalar dimension (or file size)
parameter $K$ can be expressed in terms of the other parameters as

$$ K = \left( dk - \binom{k}{2} \right) \beta, $$

as proved in [6]. A cut-set bound derived from network coding shows that the file size cannot be any larger, and
thus MBR codes are examples of regenerating codes that are optimal with respect to file size. A regenerating code
is said to be exact if the replacement of a failed node stores the same data as did the failed node, and functional
otherwise. We are concerned here only with exact-repair codes. Constructions of MBR codes for all $k \leq d = \alpha < n$
and $\beta = 1$ are presented in [7]. MBR codes with repair by transfer and $d = n - 1$ are presented in [8].

It can be inferred from the results in [8] that MBR codes are URA codes. In particular, for an $((n, k, d), (\alpha, \beta), K)$
MBR code, the rank accumulation profile is given by

$$ a_j = \begin{cases} \alpha - (j - 1), & 1 \leq j \leq k, \\ 0, & k + 1 \leq j \leq n. \end{cases} \tag{4} $$
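The profile and the file size can be tied together with a few lines of arithmetic. This is a sketch under the assumption $\beta = 1$ (so that $\alpha = d$), which is the setting in which the profile above sums to the MBR file size; the helper names are illustrative.

```python
from math import comb


def mbr_profile(n, k, alpha):
    """Rank accumulation profile: a_j = alpha - (j - 1) for j <= k, else 0."""
    return [alpha - (j - 1) if j <= k else 0 for j in range(1, n + 1)]


def mbr_file_size(k, d, beta=1):
    """MBR file size K = (d*k - C(k, 2)) * beta."""
    return (d * k - comb(k, 2)) * beta


# With beta = 1 (assumed), alpha = d and the profile sums to the file size.
for n in range(2, 9):
    for d in range(1, n):
        for k in range(1, d + 1):
            assert sum(mbr_profile(n, k, alpha=d)) == mbr_file_size(k, d)
```

For example, $(n, k, d) = (5, 3, 4)$ gives the profile $(4, 3, 2, 0, 0)$ and $K = 9$.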

D. MBR-Local Codes 

Let $\mathcal{C}$ be an $[n, K, d_{\min}, \alpha]$ vector code with

• full $(r, \delta)$-information locality with $\delta \geq 2$, and

• all of whose associated local codes $\mathcal{C}_i$, $i \in \mathcal{L}$, are MBR codes with identical parameters $((n_L = r + \delta - 1, r, d), (\alpha, \beta), K_L)$.

Then the dimension of each local code is given by

$$ K_L = \sum_{i=1}^{r} a_i = r\alpha - \binom{r}{2}, \tag{5} $$

where $\{a_i, 1 \leq i \leq n_L\}$ is the rank accumulation profile of the local MBR code $\mathcal{C}_i$.

1) Minimum distance bound for MBR-Local Codes: As MBR codes are URA codes, from Theorem I.1 we have

$$ d_{\min} \leq n - P^{(\mathrm{inv})}(K) + 1, \tag{6} $$

where, for MBR codes, we have

$$ P^{(\mathrm{inv})}(v_1 K_L + v_0) = v_1 n_L + \nu, \tag{7} $$

for some $v_1 \geq 0$, $1 \leq v_0 \leq K_L$, and $\nu$ is uniquely determined from $\alpha(\nu - 1) - \binom{\nu - 1}{2} < v_0 \leq \alpha\nu - \binom{\nu}{2}$.
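The closed form (7) can be compared against a direct search over partial sums of the MBR profile. The sketch below assumes $\beta = 1$ and uses hypothetical parameters $\alpha = 4$, $r = 3$, $\delta = 3$, $t = 3$; all function names are illustrative.

```python
from math import comb


def nu_of(v0, alpha, r):
    """The unique nu with alpha*(nu-1) - C(nu-1, 2) < v0 <= alpha*nu - C(nu, 2)."""
    for nu in range(1, r + 1):
        if alpha * (nu - 1) - comb(nu - 1, 2) < v0 <= alpha * nu - comb(nu, 2):
            return nu
    raise ValueError("v0 must satisfy 1 <= v0 <= K_L")


def p_inv_closed_form(K, alpha, r, delta):
    """Equation (7): write K = v1*K_L + v0 with 1 <= v0 <= K_L."""
    n_l = r + delta - 1
    k_l = r * alpha - comb(r, 2)
    v1, v0 = divmod(K - 1, k_l)
    return v1 * n_l + nu_of(v0 + 1, alpha, r)


def p_inv_search(K, alpha, r, delta):
    """Smallest s with P(s) >= K, using the periodic MBR profile directly."""
    n_l = r + delta - 1
    profile = [alpha - j if j < r else 0 for j in range(n_l)]
    total, s = 0, 0
    while total < K:
        total += profile[s % n_l]
        s += 1
    return s


alpha, r, delta, t = 4, 3, 3, 3            # hypothetical parameters
n = t * (r + delta - 1)                    # n = 15
K_L = r * alpha - comb(r, 2)               # K_L = 9
for K in range(1, t * K_L + 1):
    assert p_inv_closed_form(K, alpha, r, delta) == p_inv_search(K, alpha, r, delta)

d_min_bound = n - p_inv_closed_form(20, alpha, r, delta) + 1
```

With these numbers, $K = 20 = 2 \cdot 9 + 2$ gives $\nu = 1$, so $P^{(\mathrm{inv})}(20) = 11$ and the bound (6) reads $d_{\min} \leq 5$.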

2) File size bound for MBR-Local Codes: From (3), the file size bound for an optimal locality code with MBR
local codes is given by

$$ K \leq \left( \left\lceil \frac{n - d_{\min} + 1}{n_L} \right\rceil - 1 \right) K_L + \alpha\mu - \binom{\mu}{2}, \tag{8} $$

where $l_0$ is as defined in Subsection I-B and $\mu = \min\{l_0, r\}$, which follows from the rank accumulation profile
of MBR codes, i.e., from (4).
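A short numerical instance of (8) may help fix the roles of $l_0$ and $\mu$. The sketch assumes $\beta = 1$ and reuses hypothetical parameters $\alpha = 4$, $r = 3$, $\delta = 3$, $n = 15$; the function name is an illustration, not notation from the paper.

```python
from math import comb


def mbr_local_file_size_bound(n, d_min, alpha, r, delta):
    """Right-hand side of (8) for MBR local codes with beta = 1 (assumed)."""
    n_l = r + delta - 1
    k_l = r * alpha - comb(r, 2)
    m = n - d_min + 1
    full_blocks = -(-m // n_l) - 1      # ceil(m / n_L) - 1
    l0 = m - full_blocks * n_l          # in {1, ..., n_L}
    mu = min(l0, r)
    return full_blocks * k_l + alpha * mu - comb(mu, 2)


# n = 15, n_L = 5, K_L = 9: with d_min = 5, m = 11 = 2*5 + 1, so mu = 1.
bound_at_5 = mbr_local_file_size_bound(15, 5, alpha=4, r=3, delta=3)
# With d_min = 3, m = 13 = 2*5 + 3, so mu = 3 and all three local codes are full.
bound_at_3 = mbr_local_file_size_bound(15, 3, alpha=4, r=3, delta=3)
```

Here `bound_at_5` is $2 \cdot 9 + 4 = 22$ and `bound_at_3` is $3 \cdot 9 = 27$, the total scalar dimension of three local codes.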

E. Linearized Polynomials 

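A polynomial $f(x)$ over the field $\mathbb{F}_{q^m}$ is said to be linearized of $q$-degree $t$ if

$$ f(x) = \sum_{i=0}^{t} u_i x^{q^i}, \quad u_i \in \mathbb{F}_{q^m}, \; u_t \neq 0. \tag{9} $$

A linearized polynomial $f(x)$ over $\mathbb{F}_{q^m}$ satisfies the following property [9]:

$$ f(\lambda_1 \theta_1 + \lambda_2 \theta_2) = \lambda_1 f(\theta_1) + \lambda_2 f(\theta_2) \quad \forall\, \theta_1, \theta_2 \in \mathbb{F}_{q^m}, \; \lambda_1, \lambda_2 \in \mathbb{F}_q. \tag{10} $$

A linearized polynomial $f(x)$ over $\mathbb{F}_{q^m}$ of $q$-degree $t$, $m > t$, is uniquely determined from its evaluation at a set
of $(t + 1)$ points $g_1, \ldots, g_{t+1} \in \mathbb{F}_{q^m}$ that are linearly independent over $\mathbb{F}_q$.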
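Property (10) can be observed directly in a small field. The sketch below works in $\mathbb{F}_{2^4}$, represented as $\mathbb{F}_2[x]/(x^4 + x + 1)$ with elements stored as 4-bit integers; the choice $q = 2$, $m = 4$, the modulus and the sample coefficients are assumptions made for illustration. For $q = 2$ the scalars are $0$ and $1$, so $\mathbb{F}_q$-linearity reduces to additivity.

```python
M, MOD = 4, 0b10011   # GF(2^4) = GF(2)[x] / (x^4 + x + 1)


def gf_mul(a, b):
    """Multiply two field elements given as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:        # degree reached 4: reduce by the modulus
            a ^= MOD
    return result


def lin_eval(coeffs, x):
    """Evaluate f(x) = sum_i u_i * x^(2^i); addition in the field is XOR."""
    acc, power = 0, x
    for u in coeffs:
        acc ^= gf_mul(u, power)
        power = gf_mul(power, power)   # x^(2^i) -> x^(2^(i+1))
    return acc


coeffs = [7, 3, 12]   # arbitrary example: f(x) = 7x + 3x^2 + 12x^4, 2-degree 2

# Additivity: f(a + b) = f(a) + f(b) for all a, b in GF(16).
for a in range(16):
    for b in range(16):
        assert lin_eval(coeffs, a ^ b) == lin_eval(coeffs, a) ^ lin_eval(coeffs, b)
```

The check passes for any coefficient list, since squaring is additive in characteristic 2.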



F. Gabidulin Maximum Rank Distance Codes 

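We now present a construction of maximum rank distance codes, given by Gabidulin in [10]. These codes can
be viewed as a rank-metric analog of Reed-Solomon codes.

The rank of a vector $v \in \mathbb{F}_{q^m}^{\mathcal{N}}$, denoted by $\mathrm{rank}(v)$, is defined as the rank of the $m \times \mathcal{N}$ matrix $V$ over $\mathbb{F}_q$,
obtained by expanding every entry of $v$ to a column vector in $\mathbb{F}_q^m$, based on the isomorphism between $\mathbb{F}_{q^m}$ and
$\mathbb{F}_q^m$. Similarly, for two vectors $v, u \in \mathbb{F}_{q^m}^{\mathcal{N}}$, the rank distance is defined by $d_R(v, u) = \mathrm{rank}(V - U)$.

An $[\mathcal{N}, \mathcal{K}, \mathcal{D}]_{q^m}$ rank-metric code $\mathcal{C} \subseteq \mathbb{F}_{q^m}^{\mathcal{N}}$ is a linear block code over $\mathbb{F}_{q^m}$ of length $\mathcal{N}$, dimension $\mathcal{K}$ and
minimum rank distance $\mathcal{D}$. A rank-metric code that attains the Singleton bound $\mathcal{D} \leq \mathcal{N} - \mathcal{K} + 1$ in the rank metric is
called a maximum rank distance (MRD) code. For $m \geq \mathcal{N}$, a construction of MRD codes, called Gabidulin codes,
is given as follows [10].

A codeword in an $[\mathcal{N}, \mathcal{K}, \mathcal{D} = \mathcal{N} - \mathcal{K} + 1]_{q^m}$ Gabidulin code $\mathcal{C}^{\mathrm{Gab}}$, $m \geq \mathcal{N}$, is defined as

$$ c = (f(\theta_1), f(\theta_2), \ldots, f(\theta_{\mathcal{N}})) \in \mathbb{F}_{q^m}^{\mathcal{N}}, \tag{11} $$

where $f(x)$ is a linearized polynomial over $\mathbb{F}_{q^m}$ of $q$-degree $\mathcal{K} - 1$ with the coefficients given by the information
message, and where $\theta_1, \ldots, \theta_{\mathcal{N}} \in \mathbb{F}_{q^m}$ are linearly independent over $\mathbb{F}_q$ [10].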
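A toy Gabidulin code makes the MRD property concrete. The following sketch assumes $q = 2$, $m = 4$, length 4 and dimension 2, with evaluation points $1, x, x^2, x^3$ (the integers 1, 2, 4, 8), and checks by exhaustive search that the minimum rank weight equals $4 - 2 + 1 = 3$. The field representation and all names are assumptions for illustration.

```python
from itertools import product

M, MOD = 4, 0b10011   # GF(2^4) = GF(2)[x] / (x^4 + x + 1)


def gf_mul(a, b):
    """Multiply two field elements given as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return result


def lin_eval(coeffs, x):
    """Evaluate the linearized polynomial sum_i u_i * x^(2^i)."""
    acc, power = 0, x
    for u in coeffs:
        acc ^= gf_mul(u, power)
        power = gf_mul(power, power)
    return acc


def rank_weight(vec):
    """Rank over GF(2) of the entries of vec, viewed as 4-bit column vectors."""
    basis = []
    for x in vec:
        for b in basis:
            x = min(x, x ^ b)   # clears the leading bit of b in x if it is set
        if x:
            basis.append(x)
    return len(basis)


THETA = [1, 2, 4, 8]            # linearly independent over GF(2)
N, K = len(THETA), 2


def encode(msg):
    """Gabidulin codeword (f(theta_1), ..., f(theta_N)) for message (u_0, u_1)."""
    return [lin_eval(msg, theta) for theta in THETA]


# The code is linear, so minimum rank distance = minimum nonzero rank weight.
min_rank = min(rank_weight(encode(msg))
               for msg in product(range(16), repeat=K) if any(msg))
assert min_rank == N - K + 1
```

The message $(1, 1)$, i.e. $f(x) = x + x^2$, vanishes at $\theta_1 = 1$ and yields a codeword of rank weight exactly 3.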

II. Construction of Codes with MBR Locality 

In this section, we will present two constructions of codes with local regeneration. In both cases, the local
codes are MBR codes with identical parameters and both codes are optimal, i.e., they achieve the upper bound of
Theorem I.1 on minimum distance. The first construction is an all-symbol locality construction, while the second
has information locality.

The constructions presented in this paper adopt the linearized polynomial approach made use of in [11], [12],
[5]. In particular, similar to the constructions proposed in [12], [5], the constructions of this paper have a two-step
encoding process with the first step utilizing Gabidulin codes, which, in turn, are based on linearized polynomials.
The first code construction given below also proves the tightness of the bound on minimum distance of codes with
URA derived in [4] (Theorem 5.1) for the case when $K_L \nmid K$, where $K_L$ is the scalar dimension of the local MBR
code.

Consider a code $\mathcal{C}_{\mathrm{basic}}$ that is simply the concatenation of $t$ local MBR codes having identical parameters
$((n_L, r, d), (\alpha, \beta), K_L)$. Thus a typical codeword $c \in \mathcal{C}_{\mathrm{basic}}$ looks like

$$ c = \left( c_1^{\mathrm{mbr}}, c_2^{\mathrm{mbr}}, \ldots, c_t^{\mathrm{mbr}} \right), $$

where each vector $c_i^{\mathrm{mbr}}$ is a codeword belonging to the MBR code. The generator matrix $G_{\mathrm{basic}}$ of the code will
clearly have a block-diagonal structure. It is straightforward to show that the smallest number $\rho$, such that any $\rho$
thick columns of $G_{\mathrm{basic}}$ have rank $\geq K$, is given by $\rho = P^{(\mathrm{inv})}(K)$ for any $1 \leq K \leq tK_L$.
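The claim about $\rho$ can be checked exhaustively for a small case. Because the generator matrix is block-diagonal and each local code is URA, the rank of a selection of thick columns depends only on how many columns are taken from each block. The sketch below assumes a hypothetical non-increasing local profile $(3, 2, 1, 0)$ and $t = 3$ blocks; the function names are illustrative.

```python
from itertools import product


def partial_sum(profile, s):
    """P(s) for the periodic extension of the local profile."""
    periods, rest = divmod(s, len(profile))
    return periods * sum(profile) + sum(profile[:rest])


def worst_case_rank(profile, t, s):
    """Minimum rank over all ways of picking s thick columns from t blocks."""
    n_l = len(profile)
    local_rank = [sum(profile[:c]) for c in range(n_l + 1)]
    return min(sum(local_rank[c] for c in counts)
               for counts in product(range(n_l + 1), repeat=t)
               if sum(counts) == s)


def rho(profile, t, K):
    """Smallest s such that ANY s thick columns have rank >= K."""
    s = 1
    while worst_case_rank(profile, t, s) < K:
        s += 1
    return s


profile, t = [3, 2, 1, 0], 3     # hypothetical: n_L = 4, K_L = 6, n = 12

# The worst case fills whole blocks first, so it coincides with P(s).
for s in range(1, t * len(profile) + 1):
    assert worst_case_rank(profile, t, s) == partial_sum(profile, s)
```

Consequently `rho(profile, t, K)` returns the smallest $s$ with $P(s) \geq K$, which is $P^{(\mathrm{inv})}(K)$; for example, $K = 7$ gives $\rho = 5$.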

Construction II.1. We will describe the construction by showing how encoding of a message vector takes place.
The encoding is illustrated in Fig. 1. Given the message vector $u \in \mathbb{F}_{q^m}^{K}$, we first encode $u$ to a $tK_L$ long
Gabidulin codeword using $tK_L$ linearly independent points (over $\mathbb{F}_q$) $\{\theta_1, \theta_2, \ldots, \theta_{tK_L}\} \subset \mathbb{F}_{q^m}$, i.e., by applying a
$[tK_L, K, tK_L - K + 1]_{q^m}$ Gabidulin code, assuming $m \geq tK_L$. We then partition the $tK_L$ symbols of the Gabidulin
codeword, $(f(\theta_1), f(\theta_2), \ldots, f(\theta_{tK_L}))$, into $t$ disjoint sets of $K_L$ symbols each. Each of these sets is then fed in
as a message vector to a bank of $t$ identical MBR encoders whose outputs constitute $((n_L, r, d), (\alpha, \beta), K_L)$ MBR
codes. If $\{c_i^{\mathrm{mbr}} \mid i = 1, 2, \ldots, t\}$ is the resulting set of $t$ codewords, then these are concatenated to obtain the
desired codeword $c$. The code $\mathcal{C}$ thus constructed has:

• length $n = t n_L$

• $t$ local $((n_L, r, d), (\alpha, \beta), K_L)$ MBR codes with disjoint supports

• full $(r, \delta)$ MBR all-symbol locality, where $\delta$ is defined from $n_L = r + \delta - 1$.
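The parameters that Construction II.1 fixes can be collected in one place. The sketch below is only bookkeeping: it assumes $\beta = 1$ (so $K_L = r\alpha - \binom{r}{2}$) and does not implement the Gabidulin or MBR encoders themselves. The function name and the dictionary keys are hypothetical.

```python
from math import comb


def construction_ii1_parameters(t, r, delta, alpha, K):
    """Derived parameters of the two-step encoding, assuming beta = 1."""
    n_l = r + delta - 1
    k_l = r * alpha - comb(r, 2)
    if not 1 <= K <= t * k_l:
        raise ValueError("the construction requires 1 <= K <= t * K_L")
    return {
        "n": t * n_l,                               # block length
        "n_L": n_l,                                 # local code length
        "K_L": k_l,                                 # local scalar dimension
        # [length, dimension, rank distance] of the Gabidulin pre-code
        "gabidulin": (t * k_l, K, t * k_l - K + 1),
        "min_extension_degree": t * k_l,            # need m >= t * K_L
    }


params = construction_ii1_parameters(t=3, r=3, delta=3, alpha=4, K=20)
```

With these illustrative inputs the pre-code is a $[27, 20, 8]$ Gabidulin code over an extension of degree $m \geq 27$, and the final code has length $n = 15$.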



Theorem II.2. Given any set of parameters $n, r, \delta, K$ such that $n = t n_L$ and $K \leq tK_L$, Construction II.1
yields an optimal MBR-local code with full $(r, \delta)$ all-symbol locality whose minimum distance is given by

$$ d_{\min} = n - P^{(\mathrm{inv})}(K) + 1. $$

Fig. 1. Illustrating the two-step construction of the all-symbol MBR-local code.



We first present a useful lemma on codes that are obtained by concatenating a Gabidulin code (over $\mathbb{F}_{q^m}$) with
a vector code (over $\mathbb{F}_q$).

Lemma II.3. Let $G$ be the generator matrix of an $[n, \hat{J}, d_{\min}, \alpha]$ vector code over the field $\mathbb{F}_q$. Let $J$ be an integer
such that $J \leq \hat{J}$. Let $\rho$ be the smallest integer such that the submatrix of $G$ obtained by selecting any $\rho$ thick
columns of $G$ results in a matrix of rank $\geq J$. Let

$$ f(x) = \sum_{i=0}^{J-1} u_i x^{q^i} $$

be a linearized polynomial of $q$-degree at most $J - 1$ over the extension field $\mathbb{F}_{q^m}$, for $m \geq \hat{J}$. Let $\{\theta_i\}_{i=1}^{\hat{J}}$ be any
collection of $\hat{J}$ elements of $\mathbb{F}_{q^m}$ that are linearly independent over $\mathbb{F}_q$. The mapping

$$ (u_0, u_1, \ldots, u_{J-1}) \mapsto (f(\theta_1), f(\theta_2), \ldots, f(\theta_{\hat{J}}))\, G $$

defines a linear code $\mathcal{C}$ over $\mathbb{F}_{q^m}$ having message vector $(u_0, u_1, \ldots, u_{J-1})$. Then $\mathcal{C}$ has minimum distance $D_{\min}$
given by

$$ D_{\min} = n - \rho + 1, $$

i.e., $\mathcal{C}$ is an $[n, J, D_{\min}]$ code over $\mathbb{F}_{q^m}$.

Proof: Since $f(\cdot)$ is linearized, we can interchange linear operations with the operation of evaluation:

$$ (f(\theta_1)\; f(\theta_2)\; \cdots\; f(\theta_{\hat{J}}))\, G = f\left( [\theta_1\; \theta_2\; \cdots\; \theta_{\hat{J}}]\, G \right). $$

We have extended here the definition of $f(\cdot)$ to vectors through termwise application. Consider next the matrix
product

$$ \Gamma := [\theta_1\; \theta_2\; \cdots\; \theta_{\hat{J}}]\, G. $$

In writing this, we have abused notation and identified elements in $\mathbb{F}_{q^m}$ with their representations as vectors over
$\mathbb{F}_q$ lying in $\mathbb{F}_q^m$. The $m \times \hat{J}$ matrix $[\theta_1\; \theta_2\; \cdots\; \theta_{\hat{J}}]$ on the left has the property that all of its columns are linearly
independent. Hence linear dependence relations amongst columns of $\Gamma$ are precisely those inherited from the matrix
$G$. It follows that $\rho$ is also the smallest number such that any $\rho$ thick columns of the product matrix $\Gamma$ have rank
$\geq J$. Since $f(\cdot)$ is uniquely determined by its evaluation at a collection of $J$ linearly independent vectors lying in
$\mathbb{F}_q^m$, it follows that the maximum number of erasures that the code $\mathcal{C}$ can recover from is given by $n - \rho$. It follows
that

$$ D_{\min} = n - \rho + 1. $$
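The lemma can be verified by brute force on a toy instance. The sketch below deliberately uses scalar local codes ($\alpha = 1$), namely two $[3, 2]$ single-parity-check codes over $\mathbb{F}_2$ placed block-diagonally, instead of MBR codes; this substitution, the field $\mathbb{F}_{2^4}$, and the values $\hat{J} = 4$, $J = 3$ are all assumptions chosen to keep the search small.

```python
from itertools import combinations, product

M, MOD = 4, 0b10011   # GF(2^4) = GF(2)[x] / (x^4 + x + 1)


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return result


def lin_eval(coeffs, x):
    """f(x) = sum_i u_i * x^(2^i)."""
    acc, power = 0, x
    for u in coeffs:
        acc ^= gf_mul(u, power)
        power = gf_mul(power, power)
    return acc


def gf2_rank(vectors):
    basis = []
    for x in vectors:
        for b in basis:
            x = min(x, x ^ b)
        if x:
            basis.append(x)
    return len(basis)


# Columns of the 4 x 6 block-diagonal binary matrix G, one bitmask per column
# (bit i set means row i has a 1): two [3, 2] single-parity-check blocks.
COLS = [0b0001, 0b0010, 0b0011, 0b0100, 0b1000, 0b1100]
THETA = [1, 2, 4, 8]          # J_hat = 4 independent evaluation points
J, n = 3, len(COLS)

# rho: smallest p such that ANY p columns of G have rank >= J.
rho = next(p for p in range(1, n + 1)
           if all(gf2_rank(sel) >= J for sel in combinations(COLS, p)))


def encode(msg):
    """(f(theta_1), ..., f(theta_4)) G, with G applied over GF(2) via XOR."""
    evals = [lin_eval(msg, theta) for theta in THETA]
    word = []
    for col in COLS:
        sym = 0
        for i, e in enumerate(evals):
            if (col >> i) & 1:
                sym ^= e
        word.append(sym)
    return word


min_weight = min(sum(sym != 0 for sym in encode(msg))
                 for msg in product(range(16), repeat=J) if any(msg))
assert min_weight == n - rho + 1
```

In this instance $\rho = 4$, since three columns from one block have rank only 2, and the exhaustive search over all $16^3 - 1$ nonzero messages returns minimum Hamming weight $6 - 4 + 1 = 3$.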



Proof: (of Thm. II.2) Let $G_{\mathrm{basic}}$ be the generator matrix of the code that is simply the disjoint union of the
$t$ MBR codes. As explained previously, the smallest number $\rho$ such that any $\rho$ thick
columns of $G_{\mathrm{basic}}$ have rank $\geq K$ is given by $\rho = P^{(\mathrm{inv})}(K)$. It follows therefore from Lemma II.3 (by substituting
$J = K$, $\hat{J} = tK_L$ and also assuming that $G_{\mathrm{basic}}$ is over $\mathbb{F}_q$) that the code has minimum distance given by

$$ d_{\min} = n - P^{(\mathrm{inv})}(K) + 1, $$

hence the code attains the bound of Theorem I.1 and is thus optimal. ■

Remark 1. We note that whenever $K = v_1 K_L + v_0$, $v_1 \geq 0$, $1 \leq v_0 \leq K_L$, is such that $v_0 = \nu\alpha - \binom{\nu}{2}$ for some
$1 \leq \nu \leq r$, then the code constructed by Construction II.1 has the maximum possible scalar dimension given in (8).
This observation holds for the code we will construct using Construction II.4 as well.



Construction II.4. We describe here a method by which we construct a code of length $n = t n_L + \Delta$, with
$(r, \delta)$ information locality, for scalar dimension $K \leq tK_L$. Given the message vector $u \in \mathbb{F}_{q^m}^{K}$, we first encode
$u$ to a $tK_L + \Delta\alpha$ long Gabidulin codeword using a $[tK_L + \Delta\alpha, K, tK_L + \Delta\alpha - K + 1]_{q^m}$ Gabidulin code, for
$m \geq tK_L + \Delta\alpha$. We then divide the first $tK_L$ symbols of the Gabidulin codeword into $t$ disjoint groups of equal
size and encode each of these $t$ groups using an $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code (similar to the second step of
encoding in Construction II.1). This gives us a code of length $t n_L$ with MBR all-symbol locality, whose elements
are $\{c_i^{\mathrm{mbr}} \mid i = 1, 2, \ldots, t\}$. We then partition the remaining $\Delta\alpha$ symbols of the Gabidulin codeword into $\Delta$ equal
sets and denote the $i$-th set by $c_{t n_L + i}$. The construction outputs $(c_1^{\mathrm{mbr}}, \ldots, c_t^{\mathrm{mbr}}, c_{t n_L + 1}, \ldots, c_{t n_L + \Delta})$ as the final
codeword. The resultant vector code $\mathcal{C}$ has:

• length $n = t n_L + \Delta$

• $t$ local $((n_L, r, d), (\alpha, \beta), K_L)$ MBR codes with disjoint supports

• full $(r, \delta)$ MBR information-symbol locality
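The way Construction II.4 distributes the Gabidulin symbols can be sketched as an index layout. The code below only tracks which pre-code symbol indices (0-based) feed which local MBR encoder and which are stored uncoded in the additional nodes; it does not perform any field arithmetic. The function name, the argument name `extra` (standing for the number of additional nodes) and the example values are assumptions.

```python
def construction_ii4_layout(t, k_l, alpha, extra):
    """Split Gabidulin symbol indices into encoder inputs and uncoded nodes."""
    # First t * K_L symbols: t groups of K_L, one per local MBR encoder.
    encoder_inputs = [list(range(i * k_l, (i + 1) * k_l)) for i in range(t)]
    # Remaining extra * alpha symbols: `extra` nodes holding alpha symbols each.
    base = t * k_l
    uncoded_nodes = [list(range(base + i * alpha, base + (i + 1) * alpha))
                     for i in range(extra)]
    return encoder_inputs, uncoded_nodes


# Illustrative values: t = 2 local codes with K_L = 9, alpha = 4, two extra nodes.
encoder_inputs, uncoded_nodes = construction_ii4_layout(t=2, k_l=9, alpha=4, extra=2)

# Every pre-code symbol is used exactly once.
used = [j for group in encoder_inputs + uncoded_nodes for j in group]
assert used == list(range(2 * 9 + 2 * 4))
```

In this example the pre-code has length $2 \cdot 9 + 2 \cdot 4 = 26$, and setting `extra=0` recovers the partition used in Construction II.1.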



Theorem II.5. Given any set of parameters $n, r, \delta, K$ such that $n = t n_L + \Delta$ and $K \leq tK_L$, Construction II.4
results in an optimal MBR-local code with $(r, \delta)$ information locality whose minimum distance is given by

$$ d_{\min} = n - P^{(\mathrm{inv})}(K) + 1. $$

Proof: The proof follows along the same lines as the proof of Theorem II.2.



III. Fractional-Repetition Codes as Local Codes



In this section we discuss the usage of fractional repetition (FR) codes as local codes in Constructions II.1
and II.4. FR codes can be viewed as a generalization of repair-by-transfer MBR codes, where the repair process is
uncoded and table-based, i.e., FR codes have a "repair-by-transfer" property, while only specific sets of nodes of size
$d$ participate in a node repair process. For the sake of completeness, we provide an overview of the $t$-design-based
construction for FR codes presented in [13].

Let $t, n, w, \lambda$ be integers with $n > w > t$ and $\lambda > 0$. A $t$-$(n, w, \lambda)$ design is a collection $\mathcal{B}$ of $w$-subsets
(the blocks) of an $n$-set $X$ (the points), such that every $t$-subset of $X$ is contained in exactly $\lambda$ blocks. Let
$x_1, \ldots, x_t \in X$ be a set of $t$ points. We denote by $\lambda_s$ the number of blocks containing $x_1, \ldots, x_s$, $1 \leq s \leq t$. Then,

$$ \lambda_s = \lambda \binom{n - s}{t - s} \Big/ \binom{w - s}{t - s}, $$

the number of blocks in the $t$-design is $b = \lambda_0 = \lambda \binom{n}{t} / \binom{w}{t}$, and each point in $X$ is contained in $\lambda_1$ blocks, where
$\lambda_1 = \lambda \binom{n - 1}{t - 1} / \binom{w - 1}{t - 1}$.
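The counting formulas can be checked against an explicit design. The sketch below uses the 2-(7, 3, 1) design (the Fano plane) with one standard labelling of its seven blocks; the labelling and the function name `lambda_s` are assumptions for illustration.

```python
from itertools import combinations
from math import comb


def lambda_s(t, n, w, lam, s):
    """Number of blocks of a t-(n, w, lam) design containing s given points."""
    num = lam * comb(n - s, t - s)
    den = comb(w - s, t - s)
    if num % den:
        raise ValueError("parameters violate the divisibility conditions")
    return num // den


# One labelling of the Fano plane on points 1..7 (assumed).
FANO_BLOCKS = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
               {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

# Compare the formula with a direct count for s = 0, 1, 2.
for s in range(3):
    for pts in combinations(range(1, 8), s):
        count = sum(1 for blk in FANO_BLOCKS if set(pts) <= blk)
        assert count == lambda_s(2, 7, 3, 1, s)
```

For the Fano plane this gives $b = \lambda_0 = 7$ blocks, $\lambda_1 = 3$ blocks through each point, and $\lambda_2 = \lambda = 1$.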

Construction III.1. Let $B_1, \ldots, B_b \in \mathcal{B}$ be the blocks and $x_1, \ldots, x_n \in X$ the points of a $t$-$(n, w, \lambda)$ design. Then
the $n$ nodes of an FR code $\mathcal{C}$ are given by the points of the design, i.e., a node $N_i$ contains $\alpha = \lambda_1$ symbols given
by $N_i = \{j : x_i \in B_j\}$. Note that the cardinality of an intersection of any $s \leq t$ nodes is given by the numbers $\lambda_s$,
and hence the cardinality of a union of any $s \leq t$ nodes can be easily derived by the inclusion-exclusion formula.
Let $k, K$ be two integers such that $k \leq t$ and

$$ \left| \bigcup_{i=1}^{k-1} N_i \right| < K \leq \left| \bigcup_{i=1}^{k} N_i \right|. \tag{12} $$

Then we have an FR code over an alphabet of size $b$, with the property that there exists a set of $d$ nodes which
can repair a failed node and from any set of $k$ nodes one can reconstruct the original $K$ symbols.

¹ The construction in [13] sets $t = 2$ and $\lambda = 1$; the corresponding codes are called transposed codes.

Fig. 3. Fano Plane.

Given a message vector $[m_1\; m_2\; \cdots\; m_K]$, we encode the message symbols first by using a $[b, K, b - K + 1]$
MDS code to produce $b$ coded symbols $(c_1, c_2, \ldots, c_b)$ and then by employing the FR code based on the $t$-design
to produce $n$ nodes each containing $\lambda_1$ symbols.
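The node contents and the table-based repair of such an FR code can be sketched for the Fano plane. The code below treats the $b = 7$ MDS-coded symbols abstractly as indices 1 to 7 and does not implement the MDS code; the block labelling and the helper names `repair_plan` and `admissible_file_sizes` are assumptions.

```python
from itertools import combinations

# One labelling of the 2-(7, 3, 1) design (Fano plane); block B_j has index j.
FANO_BLOCKS = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
               {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
POINTS = range(1, 8)

# Node N_i stores the indices j of the blocks B_j that contain point x_i.
nodes = {i: {j for j, blk in enumerate(FANO_BLOCKS, start=1) if i in blk}
         for i in POINTS}


def repair_plan(failed):
    """For each lost symbol j, the surviving nodes that hold a copy of it."""
    return {j: sorted(FANO_BLOCKS[j - 1] - {failed}) for j in sorted(nodes[failed])}


def admissible_file_sizes(k):
    """Range of K allowed by (12): |union of k-1 nodes| < K <= |union of k nodes|."""
    def union_size(s):
        sizes = {len(set().union(*(nodes[i] for i in sub)))
                 for sub in combinations(POINTS, s)}
        if len(sizes) != 1:
            raise ValueError("union size depends on the chosen nodes")
        return sizes.pop()
    lower = union_size(k - 1) if k > 1 else 0
    return lower + 1, union_size(k)


# Each node holds alpha = lambda_1 = 3 symbols; any two nodes share lambda_2 = 1.
assert all(len(content) == 3 for content in nodes.values())
assert all(len(nodes[a] & nodes[b]) == 1 for a, b in combinations(POINTS, 2))
```

With this labelling, node 1 stores symbols $\{1, 2, 3\}$ and `repair_plan(1)` lists, for each lost symbol, the two other nodes from which it can be copied without any coding. For $k = 2$, any two nodes jointly hold $3 + 3 - 1 = 5$ distinct symbols, so (12) allows $K \in \{4, 5\}$.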

This family of FR codes based on $t$-designs is also an example of codes with uniform rank accumulation, and
thus the bound of Theorem I.1 can be used here as well. Thus, we have the following result.

Theorem III.2. When FR codes based on a $t$-design obtained by Construction III.1 are used as the local codes
in Constructions II.1 and II.4, then the resulting code with locality attains the bound of Theorem I.1 on minimum
distance.

An example of an encoding is shown in Fig. 2, where the encoding is done using the 2-(7, 3, 1) design, also known
as the Fano plane (see Fig. 3). When we replace a local MBR code with the FR code based on the Fano plane in
Fig. 1, we obtain a code with locality which has the optimal minimum distance.